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I The problem The sort of problem that got me interested in this subject is something 
^ ' like this: Consider a recurrence relation such as 

o : 

^! ttn+i = an + {n + 3)nan , ao = l, 

T— I I 

^ ■ whose solutions are integers that grow rapidly with n. (This is a cooked-up example. 
^ . For an indication of a realistic problem, look at [2], especially the formula at the end, 
>• ! and contemplate calculating I40 .) Suppose that: (1) We do not know how to solve the 
I recurrence in closed form, so we want to use a computer to grind out the values of for, 
O ' say, n < 20: 

5^ ! ai = 1, a2 = 5, as = 75, 04 = 6975, = 48845925, .... 

0\ ; 

^ ' (2) We insist on knowing the answers exactly; floating-point numbers of a flxed precision 

are not adequate. 

! The problem is that eventually the numbers will overflow the natural "word size" of 

\ the computer. If integers are represented by 16 bits, the largest signed integer is 2^^ — 1, 

^1 the largest unsigned integer is 2^^ — 1. If "long" (32-bit) integers are used, we can get up 

^ ■ to 2-^^ — 1 unsigned. 

• j-j . What happens when an integer variable overflows depends on the programming lan- 

I guage used. Some systems will give an error message and abort the program. In standard 
I C, the storage of the number "rolls over" like a car's odometer: the most signiflcant dig- 
its are lost, without warning. Programs like Maple and Mathematica do arithmetic with 
integers of (in principle) arbitrary length, but let us assume that we are not using one of 
those (else we would have no story to tell). 

The most obvious response to this situation would be to represent large integers as 
arrays of regular (say 16-bit) integers, and to program the arithmetical operations directly 
on them. Each integer piece is treated exactly like a single digit in hand arithmetic. For 
example, if n = no + 2^^ni and m = mo + 2^^mi , then nm = noWo + 2^^(nimo + 
nomi) + 2"^^nimi ; if we have uq , etc., stored as 32-bit integers (that are actually smaller 
than 2^^), then noWo is a 32-bit integer, and we can break it up as po + 2^^pi , so that 
nm — Po + 2^^(po + niniQ + nQirii) + . Next we can break up the coefficient of 2^^ 

as pi + 2^^p2 and throw p2 into the coefficient of 2^^. And so on. Addition and subtraction 
are easier, but similar; you need to handle carries and borrows between adjacent 16-bit 
parts. 



1 



This method can be programmed straightforwardly enough, but it has some disadvan- 
tages. Multiplying two arrays of length N requires A^^ multiplications of 16-bit numbers. 
Addition and subtraction require only operations, but these must be done sequentially, 
to handle the carries and borrows. This is a disadvantage on a modern parallel computer. 

In contrast, the modular method to be described here requires only N operations for 
input of size N, even for multiplication, and the operations are independent, so they can be 
performed in parallel. (And, most importantly for our present purposes, the mathematical 
theory behind this method is much more interesting!) 

Modular (residue) arithmetic An integer m > 1 defines a partition of Z into equiv- 
alence classes 

[0], [1], [m-1]. 

(For more leisurely introductions to this subject, sec the background reading recommended 
at the end of this article.) Two numbers are "equivalent (or congruent) modulo m" if their 
difference is divisible by m. Thus n G [r] means that n — qm + r with < r < m — 1. 
The modern computer notation is ''^r = n% m" ; r is called the "residue" of n modulo m; 
m is called a "modulus". (If ni and n2 are in the same equivalence class, the traditional 
mathematical notation is "ni = 77-2 modulo m", without any implication that either Uj lies 
in the range of the residues.) 

These classes can be added, subtracted, and multiplied according to definitions 

[a] + [6] = [a + 6], etc. 

We need to show that these operations are well-defined; that is, the result of applying the 
definitions does not depend on which elements a and b of the respective classes are chosen: 
Suppose that a = qam + and h = qbm + r& . Then a + b = (go + Qb)'>TT' + (i^a + i^b)- If 
fa + H > m, we subtract m from it to bring it back into the proper range, and compensate 
by adding 1 to -|- . If -|- < m, we do nothing. In either case, we get the residue 
of the sum as 

ra+b = {ra + rb)%m 

regardless of what g^ and g^ were. 
Similarly, 

ab = {qaQbrn + Vaqb + nqa)m + (r^rfo), 

so Tab = {ran) %m. 

This arithmetical consistency makes it possible to blur the distinctions among a, [a], 
and Ta , representing equivalence classes by residues and understanding all arithmetic to be 
modulo m. The equivalence classes are called "the integers modulo m" ; the set of integers 
modulo m is denoted by . 

The range to m — 1 for the residues is the most obvious one to choose, but actually 
any string of m consecutive integers can be used to represent the equivalence classes. The 
range from \—m/2] to [m/2 — ej, with in the center, is often useful. One should really 
think of Zm as forming a circle ([m] being the same thing as [0]), and the labeling of its 
elements by a particular choice of residues as analogous to a choice of range a < d < a + 27r 
for the angular coordinate on a circle. 
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Observe that odometer-style integer overflow is actuaUy an implementation of arith- 
metic modulo 2^^ (or 2"^^, as the case may be). For unsigned integers, the range of the 
residues is to 2^^ — 1; for signed integers in "two's-complement representation", the range 
is —2^^ to 2^^ — 1. ( "One's-complement representation" is a complication which we shall 
ignore.) 

The Chinese remainder theorem This theorem is called "Chinese" because a nu- 
merical example of it is stated in a Chinese manuscript of circa 300 A.D. (with author Sun 
Tsu), and the general case was stated and proved by Ch'in Chiu-Shao in 1247 A.D. 

Two integers are called "relatively prime" if their greatest common divisor is 1 (in 
other words, their prime factorizations have no number in common). 

Chinese Remainder Theorem: Let mi , . . . , m_R be positive integers that are 
pairwise relatively prime: 

gcd(mj, mfc) = 1 if j k. 

Let M — mim2 ■ ■ ■ niR . For any i?-tuple of integers, {ui, . . . , ur), there is exactly one 
integer u such that 

< u < M and u = Uj modulo nij for each j. 

(In particular, if Uj is restricted to the range to rrij — 1, then Uj = u% rrij . The same 
statement applies if the ranges of the residues are shifted, or if the range of u is shifted in 
a similar manner.) 

Before proving the theorem, let's look at an example. Suppose that mi = 12 and 
1712 = 7. Thus M = 84. For any number u between and 83, say u = 49, we can calculate 
the residues: 

ui = 1 because 49 = 4 x 12 -|- 1, 

U2 = because 49 = 7 x 7. 

In fact, it is easy to see that all the numbers fit into a table, labeled by ui horizontally and 
U2 vertically, starting with in the upper left corner and moving diagonally downward, 
then "wrapping" to the top when the bottom is reached, and wrapping to the left whenever 
the right side is reached: 
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In other words, although we can continue to think of Zis4 as a circle, it also has the two- 
dimensional structure of a torus, the "product" of two circles of circumferences 12 and 7: 

ZgA = Zi2 X Zy . 
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This example has a musical interpretation. Think of the integers modulo 84 as labeling 
musical notes, or the keys on a piano. (A standard piano has 88 keys, so there is an overlap 
of 4 keys at the ends.) 



C» G» D« A» 




C G D A E B F C 



The ratio of the frequencies of the sounds produced by adjacent keys is V2, so when we 
move up by 12 keys we reach the note of frequency twice the one we started from — an 
interval of one "octave". If we started from a C, this note is also a C. On the other hand, 
if we move up by 7 keys, we reach a note of frequency almost exactly | of the original one. 
This is called an interval of a "fifth". If we started from C (as "tonic"), this (so-called 
"dominant") note is a G. If we move up another 7 keys from G, wc reach D; and continuing 
in this way we reach every one of the 12 notes of the scale exactly once before arriving 
back at a C. This structure is called "the circle of fifths". 

(You may wonder, since the basic numbers involved here are 12 and 7, why the intervals 
are called "octaves" and "fifths". For complicated historical reasons, musicians have two 
strange rules for counting the notes between C and G or C: (1) don't count the black 
keys; (2) count both endpoints (in mathematical terms, both [0] and [m]). The result is 
to convert the 12 to an 8 and the 7 to a 5.) 

It is important to understand that the existence of the circle of fifths depends crucially 
on the fact that 12 and 7 are relatively prime. If we replaced the 7 by a 6, we would go 
from C to F* to C again, without ever visiting the other notes. If we replaced 7 by 8 or 
9 (not a divisor of 12 but not relatively prime to it either), the cycle would be longer but 
would still end prematurely. 

For an example of modular arithmetic in Z84, note that the residues of v = 2 are 
{V11V2) = (2,2). As previously observed, the residues of 49 are {ui,U2) = (1,0). Then 

{uu U2) + {vu V2) = {{ui + vi) % 12, {u2 + V2) % 7) = (3, 2), 

and we see from the table that these are the residues of 51, which is indeed 49+2. Similarly, 

(til, U2) ■ (f 1, f2) = {uivi % 12, U2V2 % 7) = (2, 0), 

which is the residue representation of 14; well, 49 ■ 2 = 98 = 84 + 14, so 49 ■ 2 does equal 
14 modulo 84. 

For another example of the Chinese remainder theorem, see Example 14.18 in [4], 
especially the table on p. 656, where the integers modulo M = 30 are represented by their 
residues modulo 2, 3, and 5. 

A quick proof of the theorem is possible: We can restate the conclusion this way: 
For every list of residues, (ui, . . . jUr), there is a unique u e Zm such that u%mj = uj . 
If u and u' are integers such that u = u' modulo ruj for 1 < j < R, then tt — tt' is a 
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multiple of each rrij . Therefore, w — is a multiple of M. (This is the step where relative 
primality is used.) This shows that the residues uniquely determine tt as a member of 
Zm ■ In other words, we have shown that the function mapping u E 7jm into its list of 
residues, (ui, . . . , ur) G Zj„^ x • • • x Zj„^, is injective. It remains to show that the function 
is surjective: every i?-tuple of residues corresponds to some u e Zm • But the number of 
i?-tuples is mi • • • mn = M, the same as the number of integers modulo M, so there is 
exactly one pigeon in every pigeonhole: the two sets are in one-to-one correspondence. 

As we saw earlier in the example, addition, subtraction, and multiplication modulo 
M can be carried out by performing the corresponding operations on the residues modulo 
the respective rrij ; for example, 

(til, ...,ur) + {vi,...,vr) = {{ui + vi)%mi,...,{uR + VR)%mR) 

yields the residue representation of u + v modulo M. We can think of the representation 
of an integer (less than M) by a string of residues as analogous to the representation of 
an integer by a string of decimal digits, with the important difference that under addi- 
tion, subtraction, and multiplication the residues in each "place" combine only among 
themselves; there is no "carry" problem! 

Implementation The practical point of this discussion should now be clear: We choose 
R relatively prime integers, rrij , all fairly big but smaller than 2^^, and we do computer 
arithmetic with arrays of R integers, {ui, . . . ,ur), modulo {mi, . . . jUIr). These arrays 
represent all the integers from up to one less than M = mi ■ ■ ■ itir , which may be much 
larger than 2^^. (In practice one may want to accommodate negative numbers by using the 
symmetric range, [— M/2] < u < \_M/2—e\ , but for simplicity I shall not discuss the details 
of that.) This representation of large integers has the advantage previously advertised, that 
arithmetic can be performed in parallel on the residue components, without carries and 
borrows and without cross terms in multiplication. (It has some disadvantages, too, which 
will be mentioned later.) 

As a quick toy example, let's calculate 7! with respect to the moduli 13, 11, 9, and 
7 — without ever encountering a number larger than 256 = 2^. (This is possible because 
the moduli are all smaller than 2^.) The residues of 7 with respect to these moduli are 
(7,7,7,0). For any smaller integer n the moduli are {n,n,n,n). However, it is more 
efficient to combine some of them: 

2 X 5 = (10,10,1,3), 3 X 4= (12,1,3,5). 

Then 

2x3x4x5 = (120, 10, 3, 15) = (3, 10, 3, 1); 
6! = (18,60, 18,6) = (5,5,0,6); 
7! = (35,35,0,0) = (9,2,0,0). 
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To check this, note that 



7! = 5040 

= 387 X 13 + 9 
= 458 X 11 + 2 
= 560 X 9 + 
= 6! X 7 + 0. 

(Later I will explain how to find the 5040 from the (9, 2, 0, 0) without knowing the answer 
beforehand.) 

Several years ago an undergraduate student, Davin Potts, wrote under my direction 
some C++ functions to implement residue arithmetic for large integers [9]. We recognized 
immediately that the problem was a natural one for object-oriented programming — specif- 
ically for "polymorphism", or "operator overloading". The point is that in ordinary C code, 
when large integers are represented by arrays, one can no longer write operations like 

c = a + b; . 

One needs to write something like 

c = sum(a, b); , 

where sum is the function (subroutine) one has written to carry out the addition. Even 
worse, if the integer is represented by a C structure that must be passed by reference, one 
might have 

add(&a, &b, pc); 
c = *pc; . 

In C++, instead, we can define the i?-tuples of residues to be a "class" named residue_int 
and can define "+", etc., to act on that class in the usual way: 

residue_int a, b, c; 
c = a + b; . 

We also need to be able to add or multiply a residue_int by an ordinary integer; this is 
done by defining "friend" functions in C++. 

The default moduli in Davin's program are the 5 largest prime numbers less than 2^^: 

65449, 65479, 65497, 65519, 65521. 

Their product is 1,204,964,463,846,332,731,259,513 ^ 10^^, so we can represent all 
unsigned integers less than that value. The program is fiexible enough to accommodate 
various numbers of moduli. The residues themselves are represented as long integers (32 
bits), so that they can be multiplied without overflow. 
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Reconstructing an integer from its residues Unfortunately, the list of residues of 
a large integer is not intuitively meaningful to a human reader. After a calculation one 
will probably want to convert from residues back into standard decimal notation. In the 
example of Z12 x Z7 we could do that by reading the rectangular table, but that method 
is not practical if the moduli are large and there are more than 2 of them. 

A good reconstruction algorithm requires some more number theory. I'll explain it in 
the context of our example, mi = 12 and 1712 = 7. 

First we need to find a number c that is the reciprocal (inverse) of one modulus 
modulo the other modulus. I claim that 3 is the reciprocal of 12 modulo 7 (in other words, 
[12] ~^ = [3] in the number system Z7). The proof is: 

12 x3 = 36 = 5x7+1 = 1 modulo 7. 

(As an exercise, show that 7 is its own reciprocal modulo 12. The calculation is hidden 
in the previous text. Both reciprocals can also be read off from the table. Later I will 
mention a systematic way of finding reciprocals.) 

Knowing that c = [mi]~^ in Z^^ , and given a residue list (iti, 1(2), we can now define 

V2 = {U2 - tti)c% 777-2 

and get the corresponding integer modulo M as 

u = V2mi + ui . 

Proof: Obviously u % mi = ui . 

u%m2 = [{u2 - ui)cmi % 1712] + [ui % 1712] 
= [iu2 - ui) %m2] + [ui %m2] 

= U2 % 1712 ■ 

Finally, one can check that < u < M ~ mim2 . 

Example: In Zg^ consider (6,4). Since c = 3 in this case, V2 = —6% 7 = 1. Thus 
it = lxl2 + 6 = 18. This agrees with our table (F^^, major 3rd). 

This method can be iterated for more than two moduli. One needs first to find the 
reciprocal of each modulus with respect to each later modulus in the list: 

Cjkrrij = 1 modulo mk ■ 

Start the algorithm by setting (fi, . . . , vr) — (wi, . . . , ur). Then at step j for 1 < j < R, 

replace Vk by {vk - Vj)cjk%mk 

for j < k < R. Finally, 

u = vi + V2mi + vzm2mi H h vrtjir-i ■■■mi. (*) 
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I leave it as a big exercise to show in this way that 7! = 5040. 

Of course, when this process is carried out in a real problem, u and some of the terms 
in (*) must be expected to exceed the word size of the machine. It may be that at this point 
only the rough magnitude of u is needed, and a conversion of the residue representation 
into a floating-point number, rather than an exact integer, will suffice. Davin wrote a C++ 
function to do that. 

This is a good place to mention the drawbacks of modular arithmetic. Although a 
modular representation is very convenient for addition, subtraction, and multiplication, 
it is not good at all for division, or for telling which of two integers is the larger, or for 
testing whether the integer is about to grow outside the allowed range set by the product 
of the moduli, M. (These last two points are especially severe when the range to M — 1 
is used, since it may be impossible to predict whether the result of a subtraction will be 
a negative number, which is not allowed.) To perform these operations one almost has to 
convert back to the form (*). (See, however, [7] and [8].) 

Another problem is that for a given underlying word size (say 2^^), there is an upper 
limit on the integers that can be represented, since there is a largest M that can be 
obtained as a product of prime (or relatively prime) numbers less than the word size. This 
limitation is seldom important in practice. 

The extended Euclidean algorithm One hole left in our description of the method is 
an algorithm for finding the reciprocals Cjk ■ An example of how this can be done appears 
in [4] on p. 649 (Example 14.13). The point is that the Euclidean algorithm for finding 
the greatest common divisor of mj and ruk yields as byproducts two integers a and b such 
that 

omi + bm2 = gcd(mi, 1712) [ = 1, since the mj are relatively prime]. 

It follows that [a] = [mi]~^ in and [b] = [m2]~^ in Z^^ . 

The "extended Euclidean algorithm" (which keeps track of the numbers a and b as 
well as the gcd) is extensively discussed by Knuth at the beginning of his first volume [5] 
as an example of the proof of validity of an algorithm by mathematical induction. It is 
implemented in [9]. 

Bibliographical remarks The primary research citation on the practical use of mod- 
ular arithmetic for large numbers is [1]. The method is extensively discussed by Knuth 
in his second volume [6] ; that book is my primary source for the Chinese remainder theo- 
rem and its proof. Szabo and Tanaka [10] also studied computer applications of modular 
arithmetic, but they were more interested in working with small moduli and building the 
modular algorithms into the hardware. Davis and Hersh [3] give a readable introduction to 
the Chinese remainder theorem consisting primarily of interesting historical commentary. 
Grimaldi [4] provides a standard textbook account of residue arithmetic. 

Acknowledgments. I thank Bob Blakley and Jack Borosh for introducing me to the residue 
method for handling large integers, and Davin Potts for my most successful student collaboration 
yet (and for comments on the manuscript). 
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